
OLS Connection

  • To better connect to the theme of this class, I applied SA to a simple OLS problem.

  • Using the data set alr4::Heights, I am interested in the linear model regressing the response variable, mheight (mother's height), on the predictor variable, dheight (daughter's height)

  • The objective function will simply be the Residual sum of squares, denoted as the following:

# assumes mheight and dheight are in scope (e.g. via alr4::Heights)
func1 <- function(xx) {
  y <- 0
  b1 <- xx[1]
  b2 <- xx[2]
  for (i in 1:length(mheight)) {
    y <- y + (mheight[i] - (b1 + dheight[i] * b2))^2
  }
  return(y)
}
  • If we have any theory or knowledge of the subject, we would expect the SA estimates of b1, b2 to be essentially the same as those from R's built-in linear model lm(), as this is a convex problem.
Results


  • And as we can see, the results are identical.

  • I used this problem as an example of the potential and appeal of simulated annealing. It can be applied to other, harder-to-compute regression problems, such as nonlinear least squares. I use this example to connect back to the class and reiterate the functionality of SA.

Travelling Salesman Problem

Column

The Problem Itself

  • The TSP aims to answer the following: “Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?”

  • The problem is NP-hard, meaning that it is computationally difficult to find an exact solution for large sets of cities. However, there are approximate algorithms that can be used to find near-optimal solutions, and the problem is often used as a benchmark for optimization algorithms.

  • The problem itself may not seem very applicable, but it has many real-world applications, such as

    • Vehicle routing: deliveries, etc.
    • Bioinformatics ~ used to compare the evolutionary distance between various species by comparing the order of genes in their genomes
    • etc.
  • The problem is old and famous, and can essentially be boiled down to the following graphic:

Objective function & neighboring function

Objective func

  • Before diving into the objective func, I load the data into the following structure:

  • In this case, the city coordinates x & y are drawn uniformly at random

# Number of cities
n <- 20

# Max x & y coordinates
max_x <- 1
max_y <- 1

# load the data frame of city coordinates
cities <- data.frame(id = 1:n, x = runif(n, max = max_x), y = runif(n, max = max_y))
  • The objective is relatively simple in this case: the total Euclidean length of the route. It can be implemented as follows
distance_func <- function(x1, x2) {
  return(sqrt( (cities$x[x1] - cities$x[x2])^2 + (cities$y[x1] - cities$y[x2])^2 ) ) 
}


# Minimizing function: total length of the route (path), closing back to the start
obj_func <- function(path) {
  total_d <- 0
  for (i in 1:n) {
    if (i <= n - 1) {
      total_d <- total_d + distance_func(path[i], path[i + 1])
    } else {
      total_d <- total_d + distance_func(path[i], path[1])
    }
  }
  return(total_d)
}
  • This is not the most efficient way for R to compute the distance; to speed things up, a distance_matrix will be used in the following problems. We can then call distance_matrix[i, j] to get an already-computed distance between cities i and j. It is faster and better suited to the problem.
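A self-contained sketch of precomputing such a distance matrix (the exact construction used in the project script is not shown here):

```r
# Precompute all pairwise city distances once, then look them up.
set.seed(42)
n <- 20
cities <- data.frame(id = 1:n, x = runif(n), y = runif(n))

# dist() computes all pairwise Euclidean distances; as.matrix() expands
# the lower triangle into a full n x n lookup table
distance_matrix <- as.matrix(dist(cities[, c("x", "y")]))

# distance_matrix[i, j] is now a constant-time lookup instead of a recomputation
d37 <- sqrt((cities$x[3] - cities$x[7])^2 + (cities$y[3] - cities$y[7])^2)
all.equal(distance_matrix[3, 7], d37)  # TRUE up to floating point
```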

Neighborhood-Function

  • In summary, a neighborhood function generates new solutions that are similar to the current solution but slightly different. These are used to explore the search space of the optimization algorithm.
  • In the context of the Travelling Salesman Problem, a neighboring function generates new solutions by swapping n cities in the current route. n can be adjusted and is typically increased as the number of cities grows.
  • In our examples, I will use n = 2, swapping two cities in the current route (excluding the origin city)
neighbourhood <- function(list1) {
  # pick two distinct positions, never position 1 (the origin city)
  random <- sample(2:length(cities$id), 2)
  i <- random[1]
  j <- random[2]

  # swap the two city IDs (the positions are distinct, so the route changes)
  list1[c(i, j)] <- list1[c(j, i)]

  return(list1)
}

Parallelization

  • Parallelization is the process of breaking down a computational task into smaller chunks that can be executed simultaneously by multiple cores or processors, thereby reducing run time.

  • As I only have a laptop with 4 processors, the parallelization did not dramatically change my run time, but I implemented it in the hopes of a new PC showing up on my doorstep….. please?

  • I implemented it using makeCluster() and parLapply() from the parallel package, and the following is the simulated_annealing function:

simulated_annealing <- function(func, temp, final_temp, origin_city_id, alpha, N) {
  
  # create a random path and then switch the initial index, with origin_city_id
  current_soln <- sample(c(1:n), n)
  current_soln[c(1, match(origin_city_id, current_soln))] <- current_soln[c(match(origin_city_id, current_soln), 1)] 
  
  best_eval <- func(current_soln)
  best_param <- current_soln
  number_of_loops <- 0
  
  while (temp > final_temp) {
    # Set up a cluster using all available cores
    cl <- makeCluster(detectCores())
    
    
    #Export the following to the cluster (allows worker nodes to find them)
    clusterExport(cl, 'distance_matrix')
    clusterExport(cl, 'n')
    clusterExport(cl, 'neighbourhood')
    
    # Use parLapply to parallelize the inner loop. Note: each worker starts
    # from the same current_soln, so the N iterations at a given temperature
    # are independent rather than chained.
    result <- parLapply(cl, 1:N, function(i) {
      random_soln <- neighbourhood(current_soln)
      delta <- func(random_soln) - func(current_soln) 
      
      if (delta < 0) {
        current_soln <- random_soln
      } else {
        p <- exp(-1 * delta / temp)
        r <- runif(1, 0, 1)
        if (r < p) {
          current_soln <- random_soln
        }
      }
      
      if (abs(func(current_soln)) < abs(best_eval)) {
        best_temp <- temp
        iter_num <- i
        best_param <- current_soln
        best_eval <- func(best_param)
      }
      
      # Return the current solution at the end of each iteration
      return(list(current_soln = current_soln, best_param = best_param, best_eval = best_eval))
    })
    
    # Update the current solution with the solution from the last iteration
    current_soln <- result[[length(result)]]$current_soln
    best_param <- result[[length(result)]]$best_param
    best_eval <- result[[length(result)]]$best_eval
    
    
    
    # Stop the cluster and release the resources
    stopCluster(cl)
    
    
    
    # update temp
    cat("Current solution: ", current_soln, "\n")
    cat("Current evaluation function value: ", func(current_soln), "\n")
    print(paste("The current temp is : ", round(temp)))
    temp <- temp * alpha
    number_of_loops <- number_of_loops + 1
  }
  return(list("Parameters" = best_param, "fval" = best_eval, 'origin_city' = origin_city_id))
  
}

10 Cities

  • In the following example, I will use randomly generated city locations, with coordinates drawn uniformly at random. The problem structure is set up as follows:
n <- 10
max_x <- 1
max_y <- 1

# load the data frame of city coordinates (the distance matrix is built from this)
cities <- data.frame(id = 1:n, x = runif(n, max = max_x), y = runif(n, max = max_y))
  • A data set with a larger n will be analyzed as well, but this size is good for checking functionality and debugging. The actual problem structure and code can be found in ‘TSP SA Algorithm’.

Results

  • As we can see, it looks as though SA has chosen the path correctly.

Larger problems & Complications

  • For larger problems, such as n > 20, my algorithm does not perform well. As mentioned before, there are many tweakable, problem-dependent components, such as:
    1. Initial Temperature
    2. Cooling Schedule
    3. Acceptance Probability
    4. Number of Iterations, N
    5. Distance function (Euclidean, Manhattan, Hamming, etc.)
    6. Neighborhood function and Initial Solution
  • Personally, I believe 6) to be the biggest factor and, with my limited computing power, the most influential on the accuracy of the solution. Therefore, I will continue on with customizing 6), while noting the importance of tweaking and adjusting 1-5 as well.

Neighborhood Function and Initial Solution

  • Some possibilities for the neighborhood include :
    • swap (which I have used up to this point)
    • Insert techniques
    • Scramble techniques
    • Inversion
    • etc.
  • Here is an example of some methods that are worth exploring:
  • Due to time constraints, this is where my project will have to stop. I plan to pick it up and explore other neighborhood functions, as the accuracy for more difficult problems is not where I want it to be.
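As a taste of what such an exploration might look like, here is an illustrative sketch of an inversion move, one of the alternatives listed above: reverse the segment of the route between two random positions, keeping position 1 (the origin city) fixed. This is not the move the project results were produced with.

```r
set.seed(7)

inversion_neighbour <- function(route) {
  idx <- sort(sample(2:length(route), 2))   # two distinct positions, never 1
  route[idx[1]:idx[2]] <- rev(route[idx[1]:idx[2]])  # reverse that segment
  route
}

route <- c(1, 2, 3, 4, 5, 6)
new_route <- inversion_neighbour(route)
new_route[1] == 1           # the origin city never moves
setequal(new_route, route)  # still visits exactly the same cities
```

Inversion tends to work well for TSP because reversing a segment changes only two edges of the tour, so the objective function of a neighbor differs from the current route in a small, controlled way.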

TSP, SA Algorithm

simulated_annealing <- function(func, temp, final_temp, origin_city_id, alpha, N) {
  
  # create a random path and then switch the initial index, with origin_city_id
  
  current_soln <- sample(c(1:length(cities$id)), length(cities$id))
  current_soln[c(1, match(origin_city_id, current_soln))] <- current_soln[c(match(origin_city_id, current_soln), 1)] 
  
  best_eval <- func(current_soln)
  best_param <- current_soln
  best_temp <- temp
  iter_num <- 0
  number_of_loops <- 0
  
  while (temp > final_temp) {
    for (i in 1:N) {
      random_soln <- neighbourhood(current_soln)
      delta <- func(random_soln) - func(current_soln) 

      if (delta < 0) {
        current_soln <- random_soln
      } else {
        p <- exp(-1 * delta / temp)
        r <- runif(1, 0, 1)
        if (r < p) {
          current_soln <- random_soln
        }
      }

      if (abs(func(current_soln)) < abs(best_eval)) {
        best_temp <- temp
        iter_num <- i
        best_param <- current_soln
        best_eval <- func(best_param)
      }
    } # end of inner loop

    # update temp
    cat("Current solution: ", current_soln, "\n")
    cat("Current evaluation function value: ", func(current_soln), "\n")
    print(paste("The current temp is : ", round(temp)))
    temp <- temp * alpha
    number_of_loops <- number_of_loops + 1
  }
  return(list("Parameters" = best_param, "fval" = best_eval, "Iteration" = iter_num, "Temperature" = best_temp, "loop_number" = number_of_loops))
}


About Me

---
title: "Simulated Annealing"
author: "Reed Shay"
output: 
  flexdashboard::flex_dashboard:
    theme:
      bg: "#101010"
      fg: "#FDF7F7" 
      primary: "#ED79F9"
      base_font:
        google: Prompt
      code_font:
        google: JetBrains Mono
    orientation: columns
    vertical_layout: fill
    social: ["facebook", "twitter", "linkedin"]
    source_code: embed
---
<style>
  h1 {
    font-size: 36px;
    text-align: center;
  }
  h3{
  font-size: 30px;
  font-weight: bold;
}
  h4{
  font-size: 28px;
}

  p {
    font-size: 22px;
    color : white;
    line-height: 1.5;
    text-align: justify;
  }

  li {
  font-size: 20px;
  color : white;
    line-height:1.5;
    text-align: justify;
}

  ul{
  font-size:24px;
   color : white;
    line-height:1.5;
    text-align: justify;
}
</style>



```{r, include=FALSE}
# chunk option dev="svg" produces very large vector graphics files
knitr::opts_chunk$set(dev="svg")
# chunk option dev="png" is the default raster graphics format for HTML output
knitr::opts_chunk$set(dev="png")

suppressWarnings(library(tidyverse))
suppressWarnings(library(stats))
suppressWarnings(library(plotly))
suppressWarnings(library(plyr))
suppressWarnings(library(flexdashboard))  

```

# Introduction

## Column {data-width=600}

### What is Simulated Annealing (SA)

-   Simulated annealing is a probabilistic method (metaheuristic) for approximating the global optimum of a function. It is similar to previously covered algorithms, such as Metropolis-Hastings and Stochastic Gradient Descent. The main difference, and advantage, is that SA can accept false (worse) solutions.

-   You may be wary about accepting false solutions, but there is an important distinction between exploitation and exploration with regard to an algorithm.

    -   Exploitation is more of a greedy approach, in which we only ever accept a better answer. For example, in gradient descent we always move in the direction of steepest descent. This is problematic in the sense that you can easily get stuck in local minima. A simple multimodal function is difficult to optimize using gradient descent, and the result is extremely dependent on your initial guess.

    -   Exploration, on the other hand, refers to the notion of exploring the solution space. Solution spaces are massive, and it is often infeasible to explore the whole space. SA by no means explores the whole solution space, but it approximates this by occasionally accepting false solutions and then exploring from those points. SA is, therefore, more exploratory in the sense that even if it starts in a certain local minimum, there is a good chance the algorithm will step out of it.

<br>

<br>

<center>![](https://ieatbugsforbreakfast.files.wordpress.com/2011/10/sa_jumpbehaviour.png)</center>

## Column {.tabset data-width="400"}

### The Motivation

-   The purpose of my project was to conduct research on the logic of a specific metaheuristic and to develop a versatile implementation of it using the R programming language. My objectives were to:

    1.  Acquire a thorough understanding of the theoretical foundations of the metaheuristic.

    2.  Create a flexible and adaptable script that can be applied to a range of optimization and regression problems.

    3.  Utilize the script developed in the previous objective to analyze and solve a variety of problems within the domains of optimization and regression.

    4.  Evaluate and summarize the performance of the metaheuristic in solving the problems examined.

### Other Topics Covered in class

The following are other topics covered in our class:

-   Ordinary Least Squares Regression
-   Multiple Regression
-   ANOVA \~ Co-linearity emphasis, effects plots, overparameterization
-   Variable selection methods \~ Coefficient of Determination, F-Model Comparison Tests, Principal Component Analysis, Information Criteria \~ AIC & BIC, forward/backward stepwise, LASSO
-   Regularization \~ lasso, ridge, and elastic net
-   Transformation \~ Box-Cox, Yeo-Johnson, Additive Models
-   Cross-Validation
-   Bootstrapping
-   GLMs \~ Generalized Linear Models
-   Leverage v Influence v Outlier \~ Cook's D, Outlier Tests
-   Bayesian Regression \~ Monte-Carlo Integration, Rejection Sampling
-   Nonlinear Regression \~ polynomial, splines
-   Optimization - Gauss-Newton Method, Gradient Descent, Stochastic Gradient Descent

# Algorithm

## Column {.tabset data-width=400}

### Outline \~ Basis

-   Generate an initial candidate solution, call it current_soln (this can be user-specified or randomly generated)
-   Set an initial temperature T, where T \> 0
-   For i in 1 : N, where N is the number of iterations per temperature:
    -   Sample ε \~ g(ε), where g is a symmetric distribution (uniform, normal, etc.)
    -   Propose a new candidate solution random_soln = current_soln ± ε
    -   Calculate Δf = f(random_soln) - f(current_soln), where f is the objective function, and the acceptance probability p = exp(-Δf/T_i)
    -   Accept the candidate with probability p: draw u \~ U(0,1) and set current_soln = random_soln if u ≤ p (a downhill move, Δf \< 0, gives p \> 1 and is always accepted)
-   Cooling: update the temperature by T = αT, where 0 \< α \< 1, and continue running until T \< final_temp (user-defined)

<br>

```{=tex}
\begin{center}
 
 In R it follows: 

\end{center}
```
<br>

    simulated_annealing <- function(func, temp, final_temp, number_of_parameters, alpha, N) {
      
      current_soln <- runif(number_of_parameters, -20, 20)
      
      # track the best solution found so far
      best_param <- current_soln
      best_eval <- func(current_soln)
      best_temp <- temp
      iter_num <- 0
      number_of_loops <- 0
      
      while (temp > final_temp) {
        for (i in 1:N) {
          random_soln <- rnorm(number_of_parameters, current_soln, 1)
          delta <- func(random_soln) - func(current_soln)

          if (delta < 0) {
            current_soln <- random_soln
          } else {
            p <- exp(-delta / temp)
            r <- runif(1, 0, 1)
            if (r < p) {
              current_soln <- random_soln
            }
          }

          if (func(current_soln) < best_eval) {
            best_temp <- temp
            iter_num <- i
            best_param <- current_soln
            best_eval <- func(best_param)
          }
        }

        # update temp
        temp <- temp * alpha
        number_of_loops <- number_of_loops + 1
      }
      return(list("Parameters" = best_param, "fval" = best_eval, "Iteration" = iter_num, "Temperature" = best_temp, "loop_number" = number_of_loops))
    }

### User Input Variables

-   My algorithm runs simulated annealing on a problem structure with the following user-input arguments:

1.  **func**
    -   func should be the objective function, or the function we are trying to minimize.\
    -   For example, in regression we want to minimize the Residual sum of squares. In a regression problem, this would be our func.
2.  **temp**
    -   temp is the initial value of our temperature. A higher temperature allows for more exploration of the solution space. As the temperature decreases, or cools, the search becomes more concentrated, since our probability of accepting a false solution decreases.\
    -   Think about temp's effect on p = exp(-Δf/T_i): for a given uphill move Δf \> 0, a larger temp gives a larger p, so we accept with higher probability.
3.  **final_temp**
    -   final_temp is significant in that it determines how many times the outer (temperature) loop runs.\
    -   We continue to run the script until temp ≤ final_temp, where temp is decreased according to temp_k = temp \* alpha\^k, with alpha in (0, 1)
4.  **number_of_parameters**
    -   number_of_parameters is how many variables we are solving for.\
    -   This is important for the generation of current_soln, since length(current_soln) == number_of_parameters. It is also used when generating each random_soln.
5.  **alpha**
    -   alpha is the cooling agent of the algorithm.\
    -   It lies in (0, 1), and when used with our cooling procedure temp_k = temp \* alpha\^k, alpha is typically between 0.8 and 0.9.
6.  **N**
    -   N is the number of iterations we want the inner loop to run per temperature.\
    -   A larger N leads to more candidate solutions being checked and more thorough exploration. It also increases the run time of the program.\
    -   If we are looking for more exploration and a better solution, increase N; the same goes for temp.
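To see how these arguments interact, here is a minimal, self-contained run on the convex toy problem f(x) = (x - 3)\^2. The function below is a compact sketch of the algorithm on this page, with the best-so-far tracking initialized up front (an assumption not spelled out in the outline); it is not the exact project script.

```r
set.seed(1)

sa_demo <- function(func, temp, final_temp, number_of_parameters, alpha, N) {
  current_soln <- runif(number_of_parameters, -20, 20)
  best_param <- current_soln
  best_eval <- func(current_soln)

  while (temp > final_temp) {
    for (i in 1:N) {
      random_soln <- rnorm(number_of_parameters, current_soln, 1)
      delta <- func(random_soln) - func(current_soln)
      # downhill moves are always accepted; uphill with probability exp(-delta/temp)
      if (delta < 0 || runif(1) < exp(-delta / temp)) {
        current_soln <- random_soln
      }
      if (func(current_soln) < best_eval) {
        best_param <- current_soln
        best_eval <- func(best_param)
      }
    }
    temp <- temp * alpha  # exponential multiplicative cooling
  }
  list(Parameters = best_param, fval = best_eval)
}

quad <- function(x) (x - 3)^2
res <- sa_demo(quad, temp = 10, final_temp = 0.01, number_of_parameters = 1,
               alpha = 0.9, N = 100)
# res$Parameters should land near 3, and res$fval near 0
```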

### Customization/Problem-Dependent

1.  **Calculating neighboring solutions**
    -   The technique I use is to draw the neighboring solution from a normal distribution with mean = current_soln and sd = 1. This can be completely altered and is up to the programmer. For example, a different neighboring function could shift by a fixed number of steps. Anything that creates new random_solns would be acceptable, for example customizing the sd in my function.

    -   Neighboring solutions also depend on the problem structure. For example, in the Travelling Salesman Problem I have the neighboring function swap two elements, so that x[i, j] becomes x[j, i]. This is entirely problem-dependent, as a normal perturbation like the one above makes no sense there.

    -   Again, there are further customizations. For a specific problem, after we have come up with our neighboring function, we can further customize it. Suppose instead of generating one random_soln we generate, say, 10, and then proceed with the random_soln that has the lowest objective function evaluation.

    -   This is an area of real importance and greatly affects our best solution.
2.  **alpha values and cooling procedure**
    -   There are many different cooling schedules for SA. The way in which we cool is a further customization that may prove to increase accuracy. The schedule we use is exponential multiplicative cooling, temp_k = temp \* alpha\^k.

    -   Other examples include linear multiplicative cooling, next_temp = initial_temp/(1 + alpha\*k), where alpha \> 0. There are plenty of other schedules, and the literature hints that non-monotonic adaptive cooling procedures lead to the lowest objective function evaluations.

    -   I didn't have the chance to explore and analyze the effect of different cooling procedures, so I continue on with exponential multiplicative cooling.
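The two schedules mentioned above are easy to compare numerically; the variable names here (temp_exp, temp_lin) are illustrative, not from the project script.

```r
# Exponential multiplicative cooling, T_k = T_0 * alpha^k with 0 < alpha < 1,
# versus linear multiplicative cooling, T_k = T_0 / (1 + alpha * k) with alpha > 0.
initial_temp <- 100
k <- 0:50

temp_exp <- initial_temp * 0.9^k
temp_lin <- initial_temp / (1 + 0.5 * k)

round(temp_exp[c(1, 11, 51)], 2)  # k = 0, 10, 50: 100.00 34.87 0.52
round(temp_lin[c(1, 11, 51)], 2)  # k = 0, 10, 50: 100.00 16.67 3.85
```

Exponential cooling decays geometrically and ends up much colder for the same number of steps, while the linear multiplicative schedule stays warmer for longer; which trade-off works better is problem-dependent.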

# Applications

## Column {.tabset data-width="400"}

### Ackley function

-   The Ackley function is implemented as follows

```{=html}
<!-- -->
```
    ackley <- function(xx, a = 20, b = 0.2, c = 2 * pi) {
      d <- length(xx)
      
      sum1 <- sum(xx^2)
      sum2 <- sum(cos(c * xx))
      
      term1 <- -a * exp(-b * sqrt(sum1 / d))
      term2 <- -exp(sum2 / d)
      
      y <- term1 + term2 + a + exp(1)
      return(y)
    }

<br>

-   The function itself is interesting as it has many local minima and is therefore used to test optimization methods.
-   It has a global minimum at x = 0, where f(x) = 0, and can be run in any number of dimensions

##### Results

-   Running the ackley function with one unknown parameter:
    -   x = 3.13 \*10\^(-5) & objective function evaluation = 0.00013
    -   Not exactly 0, but our x and objective function evaluation are sufficiently small

```{r}
library(plotly)
ackley <- function(xx, a = 20, b = 0.2, c = 2 * pi) {
  d <- length(xx)
  
  sum1 <- sum(xx^2)
  sum2 <- sum(cos(c * xx))
  
  term1 <- -a * exp(-b * sqrt(sum1 / d))
  term2 <- -exp(sum2 / d)
  
  y <- term1 + term2 + a + exp(1)
  return(y)
}





x <- seq(-30, 30, length.out = 50)
y <- seq(-30, 30, length.out = 50)
z <- as.matrix(sapply(x, function(x) sapply(y, function(y) ackley(c(x,y)))))
p <- plot_ly(x = x, y = y, z = z, type = "surface", colorscale = "Viridis") %>%
  layout(title = "Ackley Function 3D Surface Plot",
         paper_bgcolor = "rgba(0,0,0,0)",
         plot_bgcolor = "rgba(0,0,0,0)")
p

```

### Beale function

-   The Beale function is implemented as follows

```{=html}
<!-- -->
```
    beale <- function(xx) {
      x1 <- xx[1]
      x2 <- xx[2]

      term1 <- (1.5 - x1 + x1 * x2)^2
      term2 <- (2.25 - x1 + x1 * x2^2)^2
      term3 <- (2.625 - x1 + x1 * x2^3)^2

      y <- term1 + term2 + term3
      return(y)
    }

<br>

-   Similar to the Ackley function, the Beale function is interesting as it has many local minima and is therefore used to test optimization methods.
-   It has a global minimum at (x, y) = (3, 0.5), where f(x, y) = 0.
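
A quick self-contained check that the implementation hits zero at (3, 0.5):

```r
beale <- function(xx) {
  x1 <- xx[1]
  x2 <- xx[2]
  # all three terms vanish at (3, 0.5)
  (1.5 - x1 + x1 * x2)^2 +
    (2.25 - x1 + x1 * x2^2)^2 +
    (2.625 - x1 + x1 * x2^3)^2
}

beale(c(3, 0.5))  # zero at the global minimum
```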

##### Results

-   Results for f(x, y):
    -   (x, y) = (3.012, 0.504) and objective function evaluation = 6.8 \*10\^(-5)
    -   Not exactly (3, 0.5), but this is due to the stochastic nature of SA, and the objective function evaluation is sufficiently small

```{r}

beale <- function(xx) {
  x1 <- xx[1]
  x2 <- xx[2]

  term1 <- (1.5 - x1 + x1 * x2)^2
  term2 <- (2.25 - x1 + x1 * x2^2)^2
  term3 <- (2.625 - x1 + x1 * x2^3)^2

  y <- term1 + term2 + term3
  return(y)
}


x <- seq(-30, 30, length.out = 50)
y <- seq(-30, 30, length.out = 50)
z <- as.matrix(sapply(x, function(x) sapply(y, function(y) beale(c(x,y)))))
p <- plot_ly(x = x, y = y, z = z, type = "surface", colorscale = "Viridis") %>%
  layout(title = "Beale Function 3D Surface Plot",
         paper_bgcolor = "rgba(0,0,0,0)",
         plot_bgcolor = "rgba(0,0,0,0)")
p

```

### OLS Connection

-   To better connect to the theme of this class, I applied SA to a simple OLS problem.

-   Using the data set alr4::Heights, I am interested in the linear model of the response variable, mheight (mother's height), on the predictor variable dheight (daughter's height)

-   The objective function will simply be the residual sum of squares (RSS), implemented as follows:

```
func1 <- function(xx) {
  b1 <- xx[1]
  b2 <- xx[2]
  # residual sum of squares for the line mheight = b1 + b2 * dheight
  sum((mheight - (b1 + b2 * dheight))^2)
}
```

-   If we have any theory or knowledge of the subject, we would expect the SA estimates of b1 and b2 to be essentially the same as those from R's built-in linear model lm(), as this is a convex problem.
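
Because the problem is convex, the RSS minimizer also has a closed form via the normal equations, which is what lm() computes and what SA should approximate. A minimal sketch on synthetic stand-in data (the x and y here are illustrative, not the Heights columns):

```r
set.seed(1)
x <- runif(100, 55, 75)                  # stand-in predictor
y <- 30 + 0.55 * x + rnorm(100, sd = 2)  # stand-in response

# Closed-form OLS via the normal equations: beta = (X'X)^(-1) X'y
X <- cbind(1, x)
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# R's built-in lm() fit agrees with the normal equations
fit <- lm(y ~ x)
cbind(normal_eq = as.vector(beta_hat), lm = as.numeric(coef(fit)))
```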

##### Results

```{r, include = FALSE}
library(alr4)
attach(Heights)
fit <- lm(mheight ~ dheight)
summary(fit)

func1 <- function(xx) {
  y <- 0
  b1 <- xx[1]
  b2 <- xx[2]
  for (i in 1:length(mheight)) {
    y <- y + sum((mheight[i] - (b1+dheight[i] * b2))^2)
  }
  return(y)
}

simulated_annealing <- function(func, temp, final_temp, number_of_paramters, alpha, N) {
  current_soln <- runif(number_of_paramters, -20, 20)
  best_eval <- 10000000
  number_of_loops <- 0
  evals <- c()
  while (temp > final_temp) {
    for (i in 1:N) {
      random_soln <- rnorm(number_of_paramters, current_soln, 1)
      delta <- func(random_soln) - func(current_soln)

      if (delta < 0) {
        current_soln <- random_soln
      } else {
        p <- exp(-delta / temp)
        r <- runif(1, 0, 1)
        if (r < p) {
          current_soln <- random_soln
        }
      }

      if (abs(func(current_soln)) < best_eval) {
        best_temp <- temp
        iter_num <- i
        best_param <- current_soln
        best_eval <- func(best_param)
      }
    }

    # update temp
    temp <- temp * alpha
    number_of_loops <- number_of_loops + 1
  }
  return(list("Parameters" = best_param, "fval" = best_eval, "Iteration" = iter_num, "Temperature" = best_temp, "loop_number" = number_of_loops))
}


beta <- simulated_annealing(func1, 4000, 1, 2, .8, 1000)

# Prepare for plotting: SA fitted values (vectorized over all observations)
ydata <- beta$Parameters[1] + dheight * beta$Parameters[2]


```

<br>

<center>

```{r}
# and plot
# Create data frames for both plots
df1 <- data.frame(mheight, ydata)
df2 <- data.frame(mheight, fit$fitted.values)

# Create the first plot using ggplot
p1 <- ggplot(df1, aes(x=ydata, y=mheight)) + 
  geom_point() + 
  ggtitle("Simulated Annealing")

# Create the second plot using ggplot
p2 <- ggplot(df2, aes(x=fit$fitted.values, y=mheight)) + 
  geom_point() + 
  ggtitle("OLS")

# Use the `gridExtra` library to plot both plots side by side
library(gridExtra)
grid.arrange(p1, p2, ncol=2)
```

</center>

-   And as we can see, the results are virtually identical.

-   I used this problem as an example of the potential and appeal of simulated annealing. It can be applied to other, harder-to-solve regression problems, such as nonlinear least squares. I use this example to connect back to the class and reiterate the functionality of SA.

# Travelling Salesman Problem

## Column {.tabset data-width="400"}

### The Problem Itself

-   The TSP aims to answer the following : "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city"

-   The problem is NP-hard, meaning that it is computationally difficult to find an exact solution for large sets of cities. However, there are approximate algorithms that can be used to find near-optimal solutions, and the problem is often used as a benchmark for optimization algorithms.
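
To make "computationally difficult" concrete: fixing the origin city and ignoring direction, there are (n - 1)!/2 distinct tours, which grows factorially:

```r
# Number of distinct tours for n cities (origin fixed, direction ignored)
num_tours <- function(n) factorial(n - 1) / 2

num_tours(5)    # 12 tours
num_tours(10)   # 181,440 tours
num_tours(20)   # roughly 6.1e16 tours -- far too many to enumerate
```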

-   The problem itself may not seem very applicable, but it has many real-world applications, such as

    -   Vehicle routing: deliveries, etc.
    -   Bioinformatics: used to compare the evolutionary distance between various species by comparing the order of genes in the genome
    -   etc.

-   The problem is old and famous, and it can essentially be summarized by the following graphic:

<center><img src="https://www.pngfind.com/pngs/m/528-5289762_open-travelling-salesman-problem-solution-hd-png-download.png" width="33%" height="33%"/></center>

### Objective function & neighboring function

#### Objective func

-   Before diving into the objective function, I load the data into the following structure:

-   In this case, I generate uniformly distributed random x and y coordinates

```{=html}
<!-- -->
```
    # Number of cities
    n <- 20

    # Max x & y coordinates
    max_x <- 1
    max_y <- 1

    # load the cities into a data frame
    cities <- data.frame(id = 1:n, x = runif(n, max = max_x), y = runif(n, max = max_y))

-   The objective function is relatively simple in this case: it is the Euclidean distance summed along the route. It can be implemented as the following

```{=html}
<!-- -->
```
    distance_func <- function(x1, x2) {
      return(sqrt( (cities$x[x1] - cities$x[x2])^2 + (cities$y[x1] - cities$y[x2])^2 ) ) 
    }


    # Minimizing function
    obj_func <- function(path) {
      total_d = 0
      # where path is the current route
      for (i in 1:n ) {
        if (i <= n - 1) {
          total_d = total_d + distance_func(path[i], path[i +1] )                      
        } else{
          total_d = total_d + distance_func(path[i], path[1])
        } 
      }
    return(total_d)
    }

-   This is not the most efficient way for R to compute the distance; to speed it up, a distance_matrix will be used in the following problems. Then we can call distance_matrix[i, j] and it will return an already computed distance between cities i and j. It is faster and better suited to the problem.
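
That precomputation can be sketched with base R's dist() (the seed and n here are illustrative):

```r
set.seed(42)
n <- 20
cities <- data.frame(id = 1:n, x = runif(n), y = runif(n))

# Precompute all pairwise Euclidean distances once
distance_matrix <- as.matrix(dist(cities[, c("x", "y")]))

# distance_matrix[i, j] now replaces repeated distance_func(i, j) calls
distance_matrix[1, 2]
```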

#### Neighborhood Function

-   In summary, a neighborhood function generates new solutions that are similar to the current solution but slightly different. These are then used to explore the search space of the optimization algorithm.
-   In the context of the Traveling Salesman Problem, a neighborhood function generates new solutions by swapping n cities in the current route. n can be adjusted and is typically increased as the number of cities grows.
-   In our examples, I will use n = 2 and thus swap two cities in the current route (excluding the origin city)

```{=html}
<!-- -->
```
    neighbourhood = function(list1){
      random = sample(c(2:length(cities$id)), 2)
      i = random[1]
      j = random[2]
      
      # swap city IDs if and only if they are different
      if (list1[i] != list1[j]) {
        list1[c(i,j)] = list1[c(j,i)] 
      }
      
    return(list1)
    }

#### Parallelization

-   Parallelization is the process of breaking down a computational task into smaller chunks that can be executed simultaneously by multiple cores or processors, thereby increasing computational power.

-   As I only have a laptop with 4 cores, the parallelization did not dramatically change my run time, but I implemented it in the hopes of a new PC showing up on my doorstep..... please?

-   I implemented it using parLapply() from the 'parallel' package, and the following is the simulated_annealing function:

```{=html}
<!-- -->
```


    library(parallel)  # provides makeCluster(), clusterExport(), parLapply()

    simulated_annealing <- function(func, temp, final_temp, origin_city_id, alpha, N) {
      
      # create a random path and then switch the initial index, with origin_city_id
      current_soln <- sample(c(1:n), n)
      current_soln[c(1, match(origin_city_id, current_soln))] <- current_soln[c(match(origin_city_id, current_soln), 1)] 
      
      best_eval <- func(current_soln)
      number_of_loops <- 0
      evals <- c()
      
      while (temp > final_temp) {
        # Set up a cluster with the desired number of cores
        cl <- makeCluster(detectCores())  # use all available cores
        best_param = current_soln
        
        
        #Export the following to the cluster (allows worker nodes to find them)
        clusterExport(cl, 'distance_matrix')
        clusterExport(cl, 'n')
        clusterExport(cl, 'neighbourhood')
        
        # Use parLapply to parallelize the inner loop
        result <- parLapply(cl, 1:N, function(i) {
          random_soln <- neighbourhood(current_soln)
          delta <- func(random_soln) - func(current_soln) 
          
          if (delta < 0) {
            current_soln <- random_soln
          } else {
            p <- exp(-1 * delta / temp)
            r <- runif(1, 0, 1)
            if (r < p) {
              current_soln <- random_soln
            }
          }
          
          if (abs(func(current_soln)) < abs(best_eval)) {
            best_temp <- temp
            iter_num <- i
            best_param <- current_soln
            best_eval <- func(best_param)
          }
          
          # Return the current solution at the end of each iteration
          return(list(current_soln = current_soln, best_param = best_param, best_eval = best_eval))
        })
        
        # Update the current solution with the solution from the last iteration
        current_soln <- result[[length(result)]]$current_soln
        best_param <- result[[length(result)]]$best_param
        best_eval <- result[[length(result)]]$best_eval
        
        
        
        # Stop the cluster and release the resources
        stopCluster(cl)
        
        
        
        # update temp
        cat("Current solution: ", current_soln, "\n")
        cat("Current evaluation function value: ", func(current_soln), "\n")
        print(paste("The current temp is : ", round(temp)))
        temp <- temp * alpha
        number_of_loops <- number_of_loops + 1
      }
      return(list("Parameters" = best_param, "fval" = best_eval, 'origin_city' = origin_city_id))
      
    }

### 10 Cities

- In the following example, I will use randomly generated city locations, namely uniformly distributed random data points. The problem structure is as follows:

```
n <- 10
max_x <- 1
max_y <- 1

# load the cities into a data frame
cities <- data.frame(id = 1:n, x = runif(n, max = max_x), y = runif(n, max = max_y))

```

- A data set with larger n will be analyzed as well, but this smaller case is good for checking functionality and debugging. The exact problem structure and code can be found in the *'TSP SA Algorithm'* tab.

#### Results 

<center>
```{r}
n <- 10
max_x <- 1
max_y <- 1

# load df and create a distance matrix
cities <- data.frame(id = 1:n, x = runif(n, max = max_x), y = runif(n, max = max_y))


neighbourhood = function(list1){
  random = sample(c(2:length(cities$id)), 2)
  i = random[1]
  j = random[2]
  
  # swap city IDs only if they are different
  if (list1[i] != list1[j]) {
    list1[c(i,j)] = list1[c(j,i)] 
  }
  
return(list1)
}

distance_func <- function(x1, x2) {
  return(sqrt( (cities$x[x1] - cities$x[x2])^2 + (cities$y[x1] - cities$y[x2])^2 ) ) 
}


# Minimizing function
obj_func <- function(path) {
  total_d = 0
  # where path is the current route
  for (i in 1:n ) {
    if (i <= n - 1) {
      total_d = total_d + distance_func(path[i], path[i +1] )                      
    } else{
      total_d = total_d + distance_func(path[i], path[1])
    } 
  }
return(total_d)
}


simulated_annealing <- function(func, temp, final_temp, origin_city_id, alpha, N) {
  
  # create a random path and then switch the initial index, with origin_city_id
  current_soln <- sample(c(1:length(cities$id)), length(cities$id))
  current_soln[c(1, match(origin_city_id, current_soln))] <- current_soln[c(match(origin_city_id, current_soln), 1)] 
  
  best_eval <- func(current_soln)
  number_of_loops <- 0
  evals <- c()
  
  while (temp > final_temp) {
    for (i in 1:N) {
      random_soln <- neighbourhood(current_soln)
      delta <- func(random_soln) - func(current_soln) 

      if (delta < 0) {
        current_soln <- random_soln
      } else {
        p <- exp(-1 * delta / temp)
        r <- runif(1, 0, 1)
        if (r < p) {
          current_soln <- random_soln
        }
      }

      if (abs(func(current_soln)) < abs(best_eval)) {
        best_temp <- temp
        iter_num <- i
        best_param <- current_soln
        best_eval <- func(best_param)
      }
    } # end of inner loop

    # update temp
    temp <- temp * alpha
    number_of_loops <- number_of_loops + 1
  }
  return(list("Parameters" = best_param, "fval" = best_eval, "Iteration" = iter_num, "Temperature" = best_temp, "loop_number" = number_of_loops))
}

path = simulated_annealing(func = obj_func, temp = 20000, final_temp = 1, origin_city_id = 3, alpha = .8, N = 10000)

# Plotting
cities_path_x = rep(NA, length(cities$id))
cities_path_y = rep(NA, length(cities$id))

for(i in 1:length(cities$id)){
  cities_path_x[i] = cities$x[path$Parameters[i]]
  cities_path_y[i] = cities$y[path$Parameters[i]]
  
}
cities_path_x = append(cities_path_x, cities_path_x[1])
cities_path_y = append(cities_path_y, cities_path_y[1])

par(bg = 'black', col.axis="white",col.lab="white")

# Plot the path
plot(cities_path_x, cities_path_y, type ='l', main = 'Simulated annealing', xlab = 'Horizontal Distance', ylab = 'Vertical Distance',
     col = c('orange',rep('white', 10)), pch = 3)
text(cities$x, cities$y, cities$id, adj = c(1,1), col = "white")
title(main = "Simulated annealing", col.main = "white", xlab = "Horizontal Distance", ylab = "Vertical Distance", col.lab = "white")
grid(col = "white")
points(cities$x, cities$y, pch = 3,col = "white")

```
</center>

- As we can see, SA appears to have found a sensible short tour.


### Larger problems & Complications

- For larger problems, such as n > 20, my algorithm does not perform well. As mentioned before, there are many tweakable, problem-dependent components, such as:
    1.  Initial Temperature
    2.  Cooling Schedule
    3.  Acceptance Probability
    4.  Number of Iterations, N
    5.  Distance function (Euclidean, Manhattan, Hamming, etc.)
    6.  Neighborhood function and Initial Solution

- Personally, I believe 6) to be the biggest and, with my limited computing power, the most influential on the accuracy of the solution. Therefore, I will continue by customizing 6), while noting the importance of tweaking and adjusting 1-5 as well.

#### Neighborhood Function and Initial Solution


- Some possibilities for the neighborhood function include:
     - Swap (which I have used up to this point)
     - Insert techniques
     - Scramble techniques
     - Inversion
     - etc.
     
     
- Here is an example of some methods that are worth exploring :



<center>![](C:/Users/reeds/Pictures/Four-neighborhood-operations.jpg)</center>
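
As a sketch of one alternative worth exploring, here is an inversion (2-opt-style) move: instead of swapping two cities, it reverses the segment between two random positions, which tends to untangle crossing edges in a tour. The function name is illustrative:

```r
# Inversion move: reverse the sub-route between two random positions,
# leaving the origin city (position 1) fixed
inversion_move <- function(route) {
  idx <- sort(sample(2:length(route), 2))
  route[idx[1]:idx[2]] <- rev(route[idx[1]:idx[2]])
  route
}

set.seed(1)
route <- c(3, 1, 2, 4, 5, 6, 7, 8, 9, 10)
inversion_move(route)  # same cities, one contiguous segment reversed
```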

- Due to time constraints, this is where my project will have to stop. I plan to pick it up and explore other neighborhood functions, as the accuracy for more difficult problems is not where I want it to be.








### TSP, SA Algorithm

```
simulated_annealing <- function(func, temp, final_temp, origin_city_id, alpha, N) {
  
  # create a random path and then switch the initial index, with origin_city_id
  
  current_soln <- sample(c(1:length(cities$id)), length(cities$id))
  current_soln[c(1, match(origin_city_id, current_soln))] <- current_soln[c(match(origin_city_id, current_soln), 1)] 
  
  best_eval <- func(current_soln)
  number_of_loops <- 0
  evals <- c()
  
  while (temp > final_temp) {
    for (i in 1:N) {
      random_soln <- neighbourhood(current_soln)
      delta <- func(random_soln) - func(current_soln) 

      if (delta < 0) {
        current_soln <- random_soln
      } else {
        p <- exp(-1 * delta / temp)
        r <- runif(1, 0, 1)
        if (r < p) {
          current_soln <- random_soln
        }
      }

      if (abs(func(current_soln)) < abs(best_eval)) {
        best_temp <- temp
        iter_num <- i
        best_param <- current_soln
        best_eval <- func(best_param)
      }
    } # end of inner loop

    # update temp
    cat("Current solution: ", current_soln, "\n")
    cat("Current evaluation function value: ", func(current_soln), "\n")
    print(paste("The current temp is : ", round(temp)))
    temp <- temp * alpha
    number_of_loops <- number_of_loops + 1
  }
  return(list("Parameters" = best_param, "fval" = best_eval, "Iteration" = iter_num, "Temperature" = best_temp, "loop_number" = number_of_loops))
}



```

# About Me

- Hello and thank you for taking the time to learn about my project. I am thrilled with the final outcome and am proud of the hard work and effort that went into it. I am excited to continue building upon it and exploring new possibilities, as it is not quite polished.

- A little bit about myself: I am currently a Senior at the University of Dayton, double majoring in Mathematics and Psychology. Mathematics has always been a subject that I find incredibly fascinating and has been a passion of mine for a long time. I picked up Psychology as an additional major; it is more of a hobby for me, and I enjoy reading and learning more about human behavior. I am excited to see how these two diverse fields can complement each other and how they can be applied in real-world scenarios.

- In the future, I would like to work as a data analyst or data scientist, with the possibility of graduate school studying statistics or data science. Currently, I am deeply interested in the field of machine learning and its potential to revolutionize various industries. My passion for this field was ignited by this class, my first real introduction to machine learning, where I grasped the fundamental concepts and saw the potential for its application in solving real-world problems. I am eager to continue learning and exploring the possibilities in this field.

- This project presented an exciting opportunity for me to challenge my skills and expand my knowledge of HTML (I am typically horrible at front-end development). Although I had brief experience with HTML during my high school years, it had become a largely foreign concept to me. The process of starting from scratch and gradually building upon my understanding was both challenging and enjoyable. I take pride in the progress I have made and in the final outcome of the project. I hope you have found my project to be engaging and informative. Please do not hesitate to reach out to me with any questions or feedback!

- If you have any questions or would like to reach out, you can do so; here is some additional information about me:
    - Email : reedshay00@gmail.com
    - And [My Resume](JobResume.pdf) here.
 
<center>
  <img src="C:/Users/reeds/Pictures/Headshot.JPG" width="200" height="300">
</center>